Abstract. Consider a set of labels $L$ and a set of trees $\mathcal{T} = \{T^{(1)}, T^{(2)}, \ldots, T^{(k)}\}$ where
each tree is distinctly leaf-labeled by some subset of $L$. One fundamental problem
is to find the biggest tree (called a supertree) that represents $\mathcal{T}$ and minimizes the
disagreements with the trees in $\mathcal{T}$ under certain criteria. This problem finds applications
in phylogenetics, databases, and data mining. In this paper, we focus on two particular
supertree problems, namely, the maximum agreement supertree problem (MASP) and the
maximum compatible supertree problem (MCSP). These two problems are known to be
NP-hard for $k \ge 3$. This paper gives the first polynomial time algorithms for both MASP
and MCSP when both $k$ and the maximum degree $D$ of the trees are constant.



1. Introduction 

Consider a set of labels $L$ and a set of unordered trees $\mathcal{T} = \{T^{(1)}, \ldots, T^{(k)}\}$ where each
tree $T^{(i)}$ is distinctly leaf-labeled by some subset of $L$. The supertree method tries to find
a tree that represents all trees in $\mathcal{T}$ and minimizes the possible conflicts in the input trees.
The supertree method finds applications in phylogenetics, databases, and data mining. For
instance, in the Tree of Life project [ID], the supertree method is the basic tool to infer the 
phylogenetic tree of all species. 

Many supertree methods have been proposed in the literature [21 El Ej 8j. This paper 
focuses on two particular supertree methods, namely the Maximum Agreement Supertree 
(MASP) [8] and the Maximum Compatible Supertree (MCSP) [2]. Both methods try to
find a consensus tree with the largest number of leaves which can represent all the trees in
$\mathcal{T}$ under certain criteria. (See Section 2 for the formal definitions.)

MASP and MCSP are known to be NP-hard as they are the generalization of the 
Maximum Agreement Subtree problem (MAST) [H El [9] and the Maximum Compatible 
Subtree problem (MCT) [TJ [4] respectively. Jansson et al. [8] proved that MASP remains 
NP-hard even if every tree is a rooted triplet, i.e., a binary tree of 3 leaves. For k = 2, 
Jansson et al. [8] and Berry and Nicolas [2] proposed a linear time algorithm to transform 
MASP and MCSP for 2 input trees to MAST and MCT respectively. For $k \ge 3$, positive
results for computing MASP/MCSP are reported only for rooted binary trees. Jansson et
al. [8] gave an $O(k(2n^2)^{3k^2})$ time solution to this problem. Recently, Guillemot and Berry
[6] further improved the running time to $O(8^k n^k)$.

                                     Rooted                       Unrooted
MASP for k trees of max degree D     O((kD)^{kD+3} (2n)^k) †      O((kD)^{kD+3} (4n)^k) †
MCSP for k trees of max degree D     O(2^{2kD} n^k) †             O(2^{2kD} n^k) †
MASP/MCSP for k binary trees         O(k (2n^2)^{3k^2}) [8]
                                     O(8^k n^k) [6]
                                     O(6^k n^k) †

Table 1: Summary of previous and new results († stands for new result).

In general, the trees in $\mathcal{T}$ may be neither binary nor rooted. Hence, Jansson et al. [8]
posed an open problem, asking whether MASP can be solved in polynomial time when $k$ and
the maximum degree of the trees in $\mathcal{T}$ are constant. This paper gives an affirmative answer
to this question. We show that both MASP and MCSP can be solved in polynomial time
when $\mathcal{T}$ contains a constant number of bounded degree trees. For the special case where the
trees in $\mathcal{T}$ are rooted binary trees, we show that both MASP and MCSP can be solved in
$O(6^k n^k)$ time, which improves the previous best result. Table 1 summarizes the previous
and new results.

The rest of the paper is organized as follows. Section 2 gives the formal definition of the
problems. Then, Sections 3 and 4 describe the algorithms for solving MCSP for both rooted
and unrooted cases. Finally, Sections 5 and 6 detail the algorithms for solving MASP for
both rooted and unrooted cases. Proofs omitted due to space limitation will appear in the
full version of this paper.

2. Preliminary 

A phylogenetic tree is defined as an unordered and distinctly leaf-labeled tree. Given a
phylogenetic tree $T$, the notation $L(T)$ denotes the leaf set of $T$, and the size of $T$ refers to
$|L(T)|$. For any label set $S$, the restriction of $T$ to $S$, denoted $T|S$, is a phylogenetic tree
obtained from $T$ by removing all leaves in $L(T) - S$ and then suppressing all internal nodes
of degree two. (See Figure 1 for an example of restriction.) For two phylogenetic trees $T$
and $T'$, we say that $T$ refines $T'$, denoted $T \unrhd T'$, if $T'$ can be obtained by contracting some
edges of $T$. (See Figure 1 for an example of refinement.)
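The restriction operation can be illustrated with a short sketch. The encoding below is an assumption made only for illustration and is not taken from the paper: a rooted tree is a nested `frozenset`, a leaf is a string, and `None` stands for the empty tree. The function name `restrict` is likewise hypothetical.

```python
def restrict(tree, labels):
    """Return tree|labels for a rooted tree encoded as nested frozensets.

    Assumed encoding: a leaf is a string, an internal node is a frozenset
    of child subtrees, and None is the empty tree.
    """
    if tree is None:
        return None
    if isinstance(tree, str):
        return tree if tree in labels else None
    # Restrict every child and drop the children that became empty.
    kept = [c for c in (restrict(child, labels) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with a single child is suppressed.
        return kept[0]
    return frozenset(kept)


# ((a, b), (c, d)) restricted to {a, c, d} becomes (a, (c, d)).
T = frozenset({frozenset({"a", "b"}), frozenset({"c", "d"})})
T_restricted = restrict(T, {"a", "c", "d"})
```

Because the trees are unordered, equality of the nested `frozenset` values coincides with isomorphism of the leaf-labeled trees in this encoding.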

Maximum Compatible Supertree Problem: Consider a set of $k$ phylogenetic trees
$\mathcal{T} = \{T^{(1)}, \ldots, T^{(k)}\}$. A compatible supertree of $\mathcal{T}$ is a tree $Y$ such that $Y|L(T^{(i)}) \unrhd
T^{(i)}|L(Y)$ for all $i \le k$. The Maximum Compatible Supertree Problem (MCSP) is to find
a compatible supertree with as many leaves as possible. Figure 2 shows an example of a
compatible supertree $Y$ of two rooted phylogenetic trees $T^{(1)}$ and $T^{(2)}$. If all input trees have
the same leaf set, MCSP is referred to as the Maximum Compatible Subtree Problem (MCT).

Maximum Agreement Supertree Problem: Consider a set of $k$ phylogenetic trees
$\mathcal{T} = \{T^{(1)}, \ldots, T^{(k)}\}$. An agreement supertree of $\mathcal{T}$ is a tree $X$ such that $X|L(T^{(i)}) =
T^{(i)}|L(X)$ for all $i \le k$. The Maximum Agreement Supertree Problem (MASP) is to find
an agreement supertree with as many leaves as possible. Figure 2 shows an example of an
agreement supertree $X$ of two rooted phylogenetic trees $T^{(1)}$ and $T^{(2)}$. If all input trees have
the same leaf set, MASP is referred to as the Maximum Agreement Subtree Problem (MAST).

Figure 1: Three rooted trees. A tree $T$, a tree $T'$ such that $T' = T|\{a, c, d\}$, and a tree $T''$
such that $T'' \unrhd T$.
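The two definitions above can be checked mechanically for rooted trees. The following is a minimal sketch under stated assumptions: trees use a nested-`frozenset` encoding (leaf = string, `None` = empty tree), and refinement between two rooted trees on the same leaf set is tested by cluster containment, a standard characterization that the paper itself does not spell out. All function names are hypothetical.

```python
def leaves(t):
    """Leaf set of a tree in the assumed nested-frozenset encoding."""
    if t is None:
        return frozenset()
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def restrict(t, labels):
    """t|labels: drop leaves outside labels, suppress single-child nodes."""
    if t is None:
        return None
    if isinstance(t, str):
        return t if t in labels else None
    kept = [c for c in (restrict(child, labels) for child in t) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else frozenset(kept)


def clusters(t):
    """Leaf sets of all internal nodes of t."""
    if t is None or isinstance(t, str):
        return set()
    found = {leaves(t)}
    for child in t:
        found |= clusters(child)
    return found


def refines(t, u):
    """True if u is obtainable from t by contracting edges (rooted case)."""
    return leaves(t) == leaves(u) and clusters(u) <= clusters(t)


def is_agreement_supertree(x, trees):
    return all(restrict(x, leaves(t)) == restrict(t, leaves(x)) for t in trees)


def is_compatible_supertree(y, trees):
    return all(refines(restrict(y, leaves(t)), restrict(t, leaves(y))) for t in trees)


T1 = frozenset({frozenset({"a", "b"}), "c"})    # ((a, b), c)
T2 = frozenset({"a", "b", "d"})                 # star tree on a, b, d
X = frozenset({frozenset({"a", "b"}), "c", "d"})
```

In this hypothetical instance `X` is a compatible supertree of `T1` and `T2` but not an agreement supertree, because `X` restricted to `{a, b, d}` groups `a` and `b` while `T2` leaves them unresolved.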




Figure 2: An agreement supertree $X$ and a compatible supertree $Y$ of two rooted phylogenetic
trees $T^{(1)}$ and $T^{(2)}$.

In the following discussion, for the set of phylogenetic trees $\mathcal{T} = \{T^{(1)}, \ldots, T^{(k)}\}$, we
denote $n = \left|\bigcup_{i=1}^{k} L(T^{(i)})\right|$, and $D$ stands for the maximum degree of the trees in $\mathcal{T}$. We
assume that none of the trees in $\mathcal{T}$ has an internal node of degree two, so that each tree
contains at most $n - 1$ internal nodes. (If a tree $T^{(i)}$ has some internal nodes of degree two,
we can replace it by $T^{(i)}|L(T^{(i)})$ in linear time.)

3. Algorithm for MCSP of rooted trees 

Let $\mathcal{T}$ be a set of $k$ rooted phylogenetic trees. This section presents a dynamic programming
algorithm to compute the size of a maximum compatible supertree of $\mathcal{T}$ in $O(2^{2kD} n^k)$
time. The maximum compatible supertree can be obtained in the same asymptotic time
bound by backtracking.

For every compatible supertree $Y$ of $\mathcal{T}$, there exists a binary tree that refines $Y$. This
binary tree is also a compatible supertree of $\mathcal{T}$, and is of the same size as $Y$. Hence in this
section, every compatible supertree is implicitly assumed to be binary.

Definition 3.1 (Cut-subtree). A cut-subtree of a tree T is either an empty tree or a tree 
obtained by first selecting some subtrees attached to the same internal node in T and then 
connecting those subtrees by a common root. 

Definition 3.2 (Cut-subforest). Given a set of $k$ rooted (or unrooted) trees $\mathcal{T}$, a cut-subforest
of $\mathcal{T}$ is a set $\mathcal{A} = \{A^{(1)}, \ldots, A^{(k)}\}$, where $A^{(i)}$ is a cut-subtree of $T^{(i)}$ and at least
one element of $\mathcal{A}$ is not an empty tree.

Figure 3: A cut-subforest $\mathcal{A}$ of $\mathcal{T}$.

For example, in Figure 3, $\{A^{(1)}, A^{(2)}\}$ is a cut-subforest of $\{T^{(1)}, T^{(2)}\}$. Let $\mathcal{O}$ denote
the set of all possible cut-subforests of $\mathcal{T}$.

Lemma 3.3. There are $O(2^{kD} n^k)$ different cut-subforests of $\mathcal{T}$.

Proof. We claim that each tree $T^{(i)}$ contributes $2^D n$ or fewer cut-subtrees; therefore there
are $O(2^{kD} n^k)$ cut-subforests of $\mathcal{T}$. At each internal node $v$ of $T^{(i)}$, since the degree of $v$
does not exceed $D$, we have at most $2^D$ ways of selecting the subtrees attached to $v$ to
form a cut-subtree. Including the empty tree, the number of cut-subtrees in $T^{(i)}$ cannot go
beyond $(n - 1)2^D + 1 \le 2^D n$. ■
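The counting argument can be tried on small rooted trees. The sketch below is illustrative only: it assumes a nested-`frozenset` encoding (leaf = string, `None` = empty tree) and assumes that selecting a single subtree yields that subtree itself, since a root with one child would be suppressed. The helper names are hypothetical.

```python
from itertools import combinations


def make(subtrees):
    """Connect the selected subtrees by a common root."""
    subtrees = frozenset(subtrees)
    if not subtrees:
        return None                      # the empty tree
    if len(subtrees) == 1:
        return next(iter(subtrees))      # a lone subtree needs no new root
    return subtrees


def cut_subtrees(tree):
    """Enumerate the distinct cut-subtrees of a rooted tree."""
    found = {None}
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue                     # leaves have no attached subtrees
        kids = list(node)
        for r in range(len(kids) + 1):
            for chosen in combinations(kids, r):
                found.add(make(chosen))
        stack.extend(kids)
    return found


T = frozenset({frozenset({"a", "b"}), "c"})   # ((a, b), c): n = 3 leaves
n, D = 3, 3                                   # the node (a, b) has degree 3
count = len(cut_subtrees(T))
```

Here the distinct cut-subtrees are the empty tree, the three leaves, the subtree `(a, b)` and the whole tree, which is well within the bound $(n - 1)2^D + 1$ of the lemma.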

Figure 4 demonstrates that a compatible supertree of some cut-subforest $\mathcal{A}$ of $\mathcal{T}$ may
not be a compatible supertree of $\mathcal{T}$. To circumvent this irregularity, we define the embedded
supertree as follows.

Definition 3.4 (Embedded supertree). For any cut-subforest $\mathcal{A}$ of $\mathcal{T}$, a tree $Y$ is called an
embedded supertree of $\mathcal{A}$ if $Y$ is a compatible supertree of $\mathcal{A}$, and $L(Y) \cap L(T^{(i)}) \subseteq L(A^{(i)})$
for all $i \le k$.

Note that a compatible supertree of $\mathcal{T}$ is also an embedded supertree of $\mathcal{T}$. For each
cut-subforest $\mathcal{A}$ of $\mathcal{T}$, let $\mathrm{mcsp}(\mathcal{A})$ denote the maximum size of embedded supertrees of $\mathcal{A}$.
Our aim is to compute $\mathrm{mcsp}(\mathcal{T})$. Below, we first define the recursive equation for computing
$\mathrm{mcsp}(\mathcal{A})$ for all cut-subforests $\mathcal{A} \in \mathcal{O}$. Then, we describe our dynamic programming
algorithm.

We partition the cut-subforests in $\mathcal{O}$ into two classes. A cut-subforest $\mathcal{A}$ of $\mathcal{T}$ is terminal
if each element $A^{(i)}$ is either an empty tree or a leaf of $T^{(i)}$; it is called non-terminal
otherwise.

Figure 4: Consider $\mathcal{T} = \{T^{(1)}, T^{(2)}\}$ and its cut-subforest $\mathcal{A} = \{A^{(1)}, A^{(2)}\}$. Although $Z$ is
a compatible supertree of $\mathcal{A}$, it is not a compatible supertree of $\mathcal{T}$. The maximum
compatible supertree of $\mathcal{T}$ is $Y$, which contains only 2 leaves.

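For each terminal cut-subforest $\mathcal{A}$, let

$$\Lambda(\mathcal{A}) = \Big\{ \ell \in \bigcup_{j=1}^{k} L(A^{(j)}) \;\Big|\; \ell \notin L(T^{(i)}) - L(A^{(i)}) \text{ for } i = 1, 2, \ldots, k \Big\}. \tag{3.1}$$

For example, with $\mathcal{T}$ in Figure, if $A^{(1)}$ and $A^{(2)}$ are leaves labeled by $a$ and $d$ respectively
then $\Lambda(\mathcal{A}) = \{d\}$. In Lemma 3.5, we show that $\mathrm{mcsp}(\mathcal{A}) = |\Lambda(\mathcal{A})|$.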
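Formula (3.1) translates directly into a set comprehension. In the sketch below each input tree is represented only by its leaf set, and each element of a terminal cut-subforest is either `None` (the empty tree) or a single leaf label. The function name and the leaf sets in the example are hypothetical; they are chosen to reproduce the outcome $\{d\}$ of the example above and are not taken from the paper's figure.

```python
def terminal_labels(tree_leaf_sets, forest):
    """Compute Lambda(A) of formula (3.1) for a terminal cut-subforest.

    tree_leaf_sets[i] is L(T^(i)); forest[i] is None (empty tree) or the
    label of the single leaf A^(i).
    """
    picked = {a for a in forest if a is not None}
    result = set()
    for label in picked:
        # Keep the label unless some tree contains it outside of A^(i).
        blocked = any(
            label in leaf_set - ({a} if a is not None else set())
            for leaf_set, a in zip(tree_leaf_sets, forest)
        )
        if not blocked:
            result.add(label)
    return result


# Hypothetical leaf sets: L(T1) = {a, b, c}, L(T2) = {a, c, d}.
leaf_sets = [{"a", "b", "c"}, {"a", "c", "d"}]
lam = terminal_labels(leaf_sets, ["a", "d"])
```

The label `a` is rejected because it occurs in the second tree but is not the leaf selected there, while `d` does not occur in the first tree at all and is therefore kept.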

Lemma 3.5. If $\mathcal{A}$ is a terminal cut-subforest then $\mathrm{mcsp}(\mathcal{A}) = |\Lambda(\mathcal{A})|$.

Proof. Consider any embedded supertree $Y$ of $\mathcal{A}$. By Definition 3.4, every leaf of $Y$ belongs
to $\Lambda(\mathcal{A})$. Hence the value $\mathrm{mcsp}(\mathcal{A})$ does not exceed $|\Lambda(\mathcal{A})|$.

It remains to give an example of some embedded supertree of $\mathcal{A}$ whose leaf set is $\Lambda(\mathcal{A})$.
Let $C$ be a rooted caterpillar¹ whose leaf set is $\Lambda(\mathcal{A})$. The definition of $\Lambda(\mathcal{A})$ implies that
$L(C) \cap L(T^{(i)}) \subseteq L(A^{(i)})$ for every $i \le k$. Since each $A^{(i)}$ has at most one leaf, it is
straightforward that $C$ is a compatible supertree of $\mathcal{A}$. Hence $C$ is the desired example. ■

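Definition 3.6 (Bipartite). Let $\mathcal{A}$ be a cut-subforest of $\mathcal{T}$. We say that the cut-subforests
$\mathcal{A}_L$ and $\mathcal{A}_R$ bipartition $\mathcal{A}$ if for every $i \le k$, the trees $A_L^{(i)}$ and $A_R^{(i)}$ can be obtained by (1)
partitioning the subtrees attached to the root of $A^{(i)}$ into two sets $S_L^{(i)}$ and $S_R^{(i)}$; and (2)
connecting the subtrees in $S_L^{(i)}$ (resp. $S_R^{(i)}$) by a common root to form $A_L^{(i)}$ (resp. $A_R^{(i)}$).

Figure 5 shows an example of the preceding definition. For each non-terminal cut-subforest
$\mathcal{A}$, we compute $\mathrm{mcsp}(\mathcal{A})$ based on the mcsp values of $\mathcal{A}_L$ and $\mathcal{A}_R$ for each bipartite
$(\mathcal{A}_L, \mathcal{A}_R)$ of $\mathcal{A}$. More precisely, we prove that

$$\mathrm{mcsp}(\mathcal{A}) = \max\{\mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R) \mid \mathcal{A}_L \text{ and } \mathcal{A}_R \text{ bipartition } \mathcal{A}\}. \tag{3.2}$$

The identity (3.2) is then established by Lemmas 3.8 and 3.10.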

Lemma 3.7. Consider a bipartite $(\mathcal{A}_L, \mathcal{A}_R)$ of some cut-subforest $\mathcal{A}$ of $\mathcal{T}$. If $Y_L$ and $Y_R$
are embedded supertrees of $\mathcal{A}_L$ and $\mathcal{A}_R$ respectively then $Y$ is an embedded supertree of $\mathcal{A}$,
where $Y$ is formed by connecting $Y_L$ and $Y_R$ to a common root.



¹ A rooted caterpillar is a rooted, unordered, and distinctly leaf-labeled binary tree where every internal
node has at least one child that is a leaf.






HOANG AND SUNG 




Figure 5: A bipartite $(\mathcal{A}_L, \mathcal{A}_R)$ of a cut-subforest $\mathcal{A}$. The empty tree is represented by a
white circle.

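Lemma 3.8. Let $\mathcal{A}$ be a cut-subforest of $\mathcal{T}$. If $(\mathcal{A}_L, \mathcal{A}_R)$ is a bipartite of $\mathcal{A}$ then $\mathrm{mcsp}(\mathcal{A}) \ge
\mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R)$.

Proof. Consider an embedded supertree $Y_L$ of $\mathcal{A}_L$ such that $|L(Y_L)| = \mathrm{mcsp}(\mathcal{A}_L)$. Define
$Y_R$ for $\mathcal{A}_R$ similarly. Let $Y$ be a tree formed by connecting $Y_L$ and $Y_R$ with a common root.
Note that $Y$ is of size $\mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R)$. By Lemma 3.7, $Y$ is an embedded supertree
of $\mathcal{A}$ and hence the lemma follows. ■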

Lemma 3.9. Given a cut-subforest $\mathcal{A}$ of $\mathcal{T}$, let $Y$ be a binary embedded supertree of $\mathcal{A}$ with
left subtree $Y_L$ and right subtree $Y_R$. There exists a bipartite $(\mathcal{A}_L, \mathcal{A}_R)$ of $\mathcal{A}$ such that either
(i) $Y$ is an embedded supertree of $\mathcal{A}_L$; or (ii) $Y_L$ and $Y_R$ are embedded supertrees of $\mathcal{A}_L$ and
$\mathcal{A}_R$ respectively.

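Lemma 3.10. For each non-terminal cut-subforest $\mathcal{A}$ of $\mathcal{T}$, there exists a bipartite $(\mathcal{A}_L, \mathcal{A}_R)$
of $\mathcal{A}$ such that $\mathrm{mcsp}(\mathcal{A}) \le \mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R)$.

Proof. Let $Y$ be a binary embedded supertree of $\mathcal{A}$ such that $|L(Y)| = \mathrm{mcsp}(\mathcal{A})$. By
Lemma 3.9, there exists a bipartite $(\mathcal{A}_L, \mathcal{A}_R)$ of $\mathcal{A}$ such that either (1) $Y$ is an embedded
supertree of $\mathcal{A}_L$; or (2) $Y_L$ and $Y_R$ are embedded supertrees of $\mathcal{A}_L$ and $\mathcal{A}_R$ respectively,
where $Y_L$ is the left subtree and $Y_R$ is the right subtree of $Y$. In both cases, $|L(Y)| \le
\mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R)$. Then the lemma follows. ■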

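The above discussion then leads to Theorem 3.11.

Theorem 3.11. For every cut-subforest $\mathcal{A}$ of $\mathcal{T}$, the value $\mathrm{mcsp}(\mathcal{A})$ equals

$$\mathrm{mcsp}(\mathcal{A}) = \begin{cases} |\Lambda(\mathcal{A})|, & \text{if } \mathcal{A} \text{ is terminal}, \\ \max\{\mathrm{mcsp}(\mathcal{A}_L) + \mathrm{mcsp}(\mathcal{A}_R) \mid \mathcal{A}_L \text{ and } \mathcal{A}_R \text{ bipartition } \mathcal{A}\}, & \text{otherwise}. \end{cases}$$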

We define an ordering of the cut-subforests in $\mathcal{O}$ as follows. For any cut-subforests
$\mathcal{A}_1, \mathcal{A}_2$ in $\mathcal{O}$, we say that $\mathcal{A}_1$ is smaller than $\mathcal{A}_2$ if $A_1^{(i)}$ is a cut-subtree of $A_2^{(i)}$ for $i =
1, 2, \ldots, k$. Our algorithm enumerates $\mathcal{A} \in \mathcal{O}$ in topologically increasing order and computes
$\mathrm{mcsp}(\mathcal{A})$ based on Theorem 3.11. Theorem 3.12 states the complexity of our algorithm.

Theorem 3.12. A maximum compatible supertree of $k$ rooted phylogenetic trees can be
obtained in $O(2^{2kD} n^k)$ time.
Proof. Testing if a cut-subforest is terminal takes $O(k)$ time, and each terminal cut-subforest
$\mathcal{A}$ then requires $O(k^2)$ time for the computation of $\Lambda(\mathcal{A})$. In view of Lemma 3.3, it
suffices to show that each non-terminal cut-subforest $\mathcal{A}$ has $O(2^{kD})$ bipartites, since
$O(2^{kD} n^k)$ cut-subforests with $O(2^{kD})$ bipartites each give the stated bound. This result
follows from the fact that for each $i \le k$, there are at most $2^D$ ways to partition the set of
the subtrees attached to the root of $A^{(i)}$. ■
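The recurrence of Theorem 3.11 can be sketched as a memoized recursion. This is a simplified illustration under stated assumptions, not the paper's implementation: trees are nested `frozenset` values (leaf = string, `None` = empty tree), the explicit topological enumeration is replaced by top-down memoization, and bipartitions in which one side is entirely empty are skipped because an all-empty set is not a cut-subforest. The function names are hypothetical, and the sketch is intended only for very small inputs.

```python
from functools import lru_cache
from itertools import product


def leaves(t):
    if t is None:
        return frozenset()
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def make(subtrees):
    """Connect selected subtrees by a common root (None if nothing selected)."""
    subtrees = frozenset(subtrees)
    if not subtrees:
        return None
    return next(iter(subtrees)) if len(subtrees) == 1 else subtrees


def splits(a):
    """All ordered (left, right) splits of a single cut-subtree."""
    if a is None:
        return [(None, None)]
    if isinstance(a, str):
        return [(a, None), (None, a)]
    kids = list(a)
    out = []
    for mask in range(1 << len(kids)):
        left = [c for j, c in enumerate(kids) if (mask >> j) & 1]
        right = [c for j, c in enumerate(kids) if not (mask >> j) & 1]
        out.append((make(left), make(right)))
    return out


def mcsp_size(trees):
    trees = tuple(trees)
    full = [leaves(t) for t in trees]

    def lam(forest):
        # |Lambda(A)| of formula (3.1) for a terminal cut-subforest.
        labels = frozenset().union(*(leaves(a) for a in forest))
        return sum(
            all(x not in full[i] - leaves(forest[i]) for i in range(len(trees)))
            for x in labels
        )

    @lru_cache(maxsize=None)
    def solve(forest):
        if all(a is None or isinstance(a, str) for a in forest):
            return lam(forest)
        best = 0
        for combo in product(*(splits(a) for a in forest)):
            left = tuple(l for l, _ in combo)
            right = tuple(r for _, r in combo)
            if all(a is None for a in left) or all(a is None for a in right):
                continue  # not a pair of cut-subforests
            best = max(best, solve(left) + solve(right))
        return best

    return solve(trees)


T1 = frozenset({frozenset({"a", "b"}), "c"})   # ((a, b), c)
T2 = frozenset({frozenset({"a", "c"}), "b"})   # ((a, c), b)
T3 = frozenset({frozenset({"a", "b"}), "d"})   # ((a, b), d)
```

For the conflicting pair `T1`, `T2` the recursion keeps two leaves, while for `T1`, `T3` all four labels can be kept.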

In the special case where every tree is binary, Theorem 3.13 shows that our algorithm
actually has a better time complexity. Note that the concepts of agreement supertree
and compatible supertree coincide for binary trees. Hence, our algorithm improves the
$O(8^k n^k)$-time algorithm in [6] for computing a maximum agreement supertree of $k$ rooted
binary trees.

Theorem 3.13. If every tree in $\mathcal{T}$ is binary, a maximum compatible supertree (or a maximum
agreement supertree) can be computed in $O(6^k n^k)$ time.

Proof. We claim that the processing of non-terminal cut-subforests of $\mathcal{T}$ requires $O(6^k n^k)$
time. The argument in the proof of Theorem 3.12 tells that the remaining computation
runs within the same asymptotic time bound. Consider an integer $r \in \{0, 1, \ldots, k\}$. We
shall be dealing with a cut-subforest $\mathcal{A}$ such that there are exactly $r$ cut-subtrees $A^{(i)}$ whose
roots are internal nodes of $T^{(i)}$. The key of this proof is to show that the number of those
cut-subforests does not exceed $\binom{k}{r}(n-1)^r(n+1)^{k-r}$, and the running time for each
cut-subforest is $O(4^r 2^{k-r})$. Hence, the total running time for all non-terminal cut-subforests
is

$$\sum_{r=0}^{k} \binom{k}{r} (n-1)^r (n+1)^{k-r} \, O\!\left(4^r 2^{k-r}\right) = O\!\left(6^k n^k\right).$$

We can count the number of the specified cut-subforests $\mathcal{A}$ as follows. First there are
$\binom{k}{r}$ options for the $r$ indices $i$ such that the roots of the cut-subtrees $A^{(i)}$ are internal nodes of
$T^{(i)}$. For those cut-subtrees, we then appoint one of the $(n - 1)$ or fewer internal nodes of
$T^{(i)}$ to be the root node of $A^{(i)}$. Every other cut-subtree of $\mathcal{A}$ is a leaf or the empty tree,
and can then be determined from at most $n + 1$ alternatives. Multiplying those possibilities
gives us the bound stipulated in the preceding paragraph.

It remains to estimate the running time for each specified cut-subforest $\mathcal{A}$. This task
requires us to bound the number of bipartites of each cut-subforest. If the root $v$ of $A^{(i)}$ is
an internal node of $T^{(i)}$ then $A^{(i)}$ contributes 4 or fewer ways of partitioning the set of the
subtrees attached to $v$. Otherwise, we have at most 2 ways of partitioning this set. Hence
$\mathcal{A}$ owns at most $4^r 2^{k-r}$ bipartites, and this completes the proof. ■
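The summation in the proof collapses by the binomial theorem: the terms $\binom{k}{r}\,(4(n-1))^r\,(2(n+1))^{k-r}$ add up to $(4(n-1) + 2(n+1))^k = (6n-2)^k \le 6^k n^k$. The short check below confirms this identity numerically for small values; the function name is hypothetical and the check is a sanity test, not part of the algorithm.

```python
from math import comb


def nonterminal_work(n, k):
    """Sum over r of (number of cut-subforests) x (bipartites per cut-subforest)."""
    return sum(
        comb(k, r) * (n - 1) ** r * (n + 1) ** (k - r) * 4 ** r * 2 ** (k - r)
        for r in range(k + 1)
    )


# Compare the sum with its closed form and with the stated bound.
checks = [
    nonterminal_work(n, k) == (6 * n - 2) ** k <= (6 * n) ** k
    for n in range(2, 10)
    for k in range(1, 7)
]
```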



4. Algorithm for MCSP of unrooted trees 

Let $\mathcal{T}$ be a set of $k$ unrooted phylogenetic trees. This section extends the algorithm
in Section 3 to find the size of a maximum compatible supertree of $\mathcal{T}$. The maximum
compatible supertree can be obtained by backtracking. Surprisingly, the extended algorithm
for unrooted trees runs within the same asymptotic time bound as the original algorithm
for rooted trees.

We will follow the same approach as Section 3, i.e., for each cut-subforest $\mathcal{A}$ of $\mathcal{T}$, we
find an embedded supertree of $\mathcal{A}$ of maximum size. Definitions 3.1, 3.2 and 3.4 for cut-subforest
and embedded supertree in the previous section are still valid for unrooted trees.
Notice that although $\mathcal{T}$ is a set of unrooted trees, each cut-subforest $\mathcal{A}$ of $\mathcal{T}$ consists of
rooted trees. (See Figure 6 for an example of a cut-subforest for unrooted trees.) Hence we
can use the algorithm in Section 3 to find the maximum embedded supertree of $\mathcal{A}$. We then
select the biggest tree $T$ among those maximum embedded supertrees for all cut-subforests
of $\mathcal{T}$, and unroot $T$ to obtain the maximum compatible supertree of $\mathcal{T}$.




Figure 6: The set of rooted trees $\mathcal{A} = \{A^{(1)}, A^{(2)}\}$ is a cut-subforest of $\mathcal{T} = \{T^{(1)}, T^{(2)}\}$.

Theorem 4.1 shows that the extended algorithm has the same asymptotic time bound
as the algorithm in Section 3.

Theorem 4.1. We can find a maximum compatible supertree of $k$ unrooted phylogenetic
trees in $O(2^{2kD} n^k)$ time.

Proof. Using a similar proof as Lemma 3.3, we can prove that there are $O(2^{kD} n^k)$ cut-subforests
of $\mathcal{T}$. As given in the proof of Theorem 3.12, finding the maximum embedded
supertrees of each cut-subforest takes $O(2^{kD})$ time. Hence the extended algorithm runs
within the specified time bound. ■

5. Algorithm for MASP of rooted trees 

Let $\mathcal{T}$ be a set of $k$ rooted phylogenetic trees. This section presents a dynamic programming
algorithm to compute the size of a maximum agreement supertree of $\mathcal{T}$ in
$O((kD)^{kD+3}(2n)^k)$ time. The maximum agreement supertree can be obtained in the same
asymptotic time bound by backtracking.

The idea here is similar to that of Section 3. However, while we can assume that
compatible supertrees are binary, the maximum degree of agreement supertrees can grow
up to $kD$. This is the reason why we have the factor $O((kD)^{kD+3})$ in the complexity.

Definition 5.1 (Sub-forest). Given a set of $k$ rooted trees $\mathcal{T}$, a sub-forest of $\mathcal{T}$ is a set
$\mathcal{A} = \{A^{(1)}, \ldots, A^{(k)}\}$, where each $A^{(i)}$ is either an empty tree or a complete subtree rooted
at some node of $T^{(i)}$, and at least one element of $\mathcal{A}$ is not an empty tree.

Notice that the definition of sub-forest does not coincide with the concept of cut-subforest
in Definition 3.2 of Section 3. For example, the cut-subforest $\mathcal{A}$ in Figure 3
is not a sub-forest of $\mathcal{T}$, because one of its elements $A^{(i)}$ is not a complete subtree rooted at some node of $T^{(i)}$.
Let $\mathcal{O}$ denote the set of all possible sub-forests of $\mathcal{T}$. Then $|\mathcal{O}| = O((2n)^k)$.
Definition 5.2 (Enclosed supertree). For any sub-forest $\mathcal{A}$ of $\mathcal{T}$, a tree $X$ is called an
enclosed supertree of $\mathcal{A}$ if $X$ is an agreement supertree of $\mathcal{A}$, and $L(X) \cap L(T^{(i)}) \subseteq L(A^{(i)})$
for all $i \le k$.

For each sub-forest $\mathcal{A}$ of $\mathcal{T}$, let $\mathrm{masp}(\mathcal{A})$ denote the maximum size of enclosed supertrees
of $\mathcal{A}$. We use a similar approach as Section 3, i.e., we compute $\mathrm{masp}(\mathcal{A})$ for all $\mathcal{A} \in \mathcal{O}$, and
$\mathrm{masp}(\mathcal{T})$ is the size of a maximum agreement supertree of $\mathcal{T}$. We partition the sub-forests
in $\mathcal{O}$ into two classes. A sub-forest $\mathcal{A}$ is terminal if each $A^{(i)}$ is either an empty tree or a leaf.
Otherwise, $\mathcal{A}$ is called non-terminal.

Notice that for a terminal sub-forest, the definition of enclosed supertree coincides with
the concept of embedded supertree in Definition 3.4 of Section 3. Then by Lemma 3.5,
we have $\mathrm{masp}(\mathcal{A}) = |\Lambda(\mathcal{A})|$. (Please refer to formula (3.1) in the paragraph preceding
Lemma 3.5 for the definition of the function $\Lambda$.)

Definition 5.3 (Decomposition). Let $\mathcal{A}$ be a sub-forest of $\mathcal{T}$. We say that sub-forests
$\mathcal{B}_1, \ldots, \mathcal{B}_d$ (with $d \ge 2$) decompose $\mathcal{A}$ if for all $i \le k$, either (i) exactly one of $B_1^{(i)}, \ldots, B_d^{(i)}$
is isomorphic to $A^{(i)}$ while the others are empty trees; or (ii) there are at least 2 nonempty
trees in $B_1^{(i)}, \ldots, B_d^{(i)}$, and all those nonempty trees are isomorphic to pairwise distinct
subtrees attached to the root of $A^{(i)}$.

Figure 7: A decomposition $(\mathcal{B}_1, \mathcal{B}_2, \mathcal{B}_3)$ of a sub-forest $\mathcal{A}$. The empty trees are represented
by white circles.

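Figure 7 illustrates the concept of decomposition. For each sub-forest $\mathcal{A}$ of $\mathcal{T}$, we will
prove that

$$\mathrm{masp}(\mathcal{A}) = \max\{\mathrm{masp}(\mathcal{B}_1) + \cdots + \mathrm{masp}(\mathcal{B}_d) \mid \mathcal{B}_1, \ldots, \mathcal{B}_d \text{ decompose } \mathcal{A}\}. \tag{5.1}$$

The identity (5.1) is then established by Lemmas 5.5 and 5.7.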

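Lemma 5.4. Suppose $(\mathcal{B}_1, \ldots, \mathcal{B}_d)$ is a decomposition of some sub-forest $\mathcal{A}$ of $\mathcal{T}$. Let
$\tau_1, \ldots, \tau_d$ be some enclosed supertrees of $\mathcal{B}_1, \ldots, \mathcal{B}_d$ respectively, and let $X$ be the tree obtained
by connecting $\tau_1, \ldots, \tau_d$ to a common root. Then, $X$ is an enclosed supertree of $\mathcal{A}$.

Lemma 5.5. If $(\mathcal{B}_1, \ldots, \mathcal{B}_d)$ is a decomposition of a sub-forest $\mathcal{A}$ of $\mathcal{T}$ then $\mathrm{masp}(\mathcal{A}) \ge
\mathrm{masp}(\mathcal{B}_1) + \cdots + \mathrm{masp}(\mathcal{B}_d)$.

Proof. For each $\mathcal{B}_j$, let $\tau_j$ be an enclosed supertree of $\mathcal{B}_j$ such that $|L(\tau_j)| = \mathrm{masp}(\mathcal{B}_j)$. Let
$X$ be the tree obtained by connecting $\tau_1, \ldots, \tau_d$ to a common root. By Lemma 5.4, $X$ is an
enclosed supertree of $\mathcal{A}$. Hence $|L(\tau_1)| + \cdots + |L(\tau_d)| = |L(X)| \le \mathrm{masp}(\mathcal{A})$. ■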
Lemma 5.6. Let $X$ be an enclosed supertree of some sub-forest $\mathcal{A}$ of $\mathcal{T}$, and let $\tau_1, \ldots, \tau_d$ be
all subtrees attached to the root of $X$. Then either (i) there is a decomposition $(\mathcal{B}_1, \mathcal{B}_2)$ of
$\mathcal{A}$ such that $X$ is an enclosed supertree of $\mathcal{B}_1$; or (ii) there is a decomposition $(\mathcal{B}_1, \ldots, \mathcal{B}_d)$
of $\mathcal{A}$ such that each $\tau_j$ is an enclosed supertree of $\mathcal{B}_j$.

Lemma 5.7. For each non-terminal sub-forest $\mathcal{A}$ of $\mathcal{T}$, there is a decomposition $(\mathcal{B}_1, \ldots, \mathcal{B}_d)$
of $\mathcal{A}$ such that $\mathrm{masp}(\mathcal{A}) \le \mathrm{masp}(\mathcal{B}_1) + \cdots + \mathrm{masp}(\mathcal{B}_d)$.

Proof. Let $X$ be an enclosed supertree of $\mathcal{A}$ such that $|L(X)| = \mathrm{masp}(\mathcal{A})$ and let $\tau_1, \ldots, \tau_d$ be
all subtrees attached to the root of $X$. By Lemma 5.6, either (i) there exists a decomposition
$(\mathcal{B}_1, \mathcal{B}_2)$ of $\mathcal{A}$ such that $X$ is an enclosed supertree of $\mathcal{B}_1$; or (ii) there is a decomposition
$(\mathcal{B}_1, \ldots, \mathcal{B}_d)$ of $\mathcal{A}$ such that each $\tau_j$ is an enclosed supertree of $\mathcal{B}_j$. In case (i), we have
$|L(X)| \le \mathrm{masp}(\mathcal{B}_1) \le \mathrm{masp}(\mathcal{B}_1) + \mathrm{masp}(\mathcal{B}_2)$. On the other hand, in case (ii), we have
$|L(X)| = |L(\tau_1)| + \cdots + |L(\tau_d)| \le \mathrm{masp}(\mathcal{B}_1) + \cdots + \mathrm{masp}(\mathcal{B}_d)$. ■

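The above discussion then leads to Theorem 5.8.

Theorem 5.8. For every sub-forest $\mathcal{A}$ of $\mathcal{T}$, the value $\mathrm{masp}(\mathcal{A})$ equals

$$\mathrm{masp}(\mathcal{A}) = \begin{cases} |\Lambda(\mathcal{A})|, & \text{if } \mathcal{A} \text{ is terminal}, \\ \max\{\mathrm{masp}(\mathcal{B}_1) + \cdots + \mathrm{masp}(\mathcal{B}_d) \mid \mathcal{B}_1, \ldots, \mathcal{B}_d \text{ decompose } \mathcal{A}\}, & \text{otherwise}. \end{cases}$$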

We define an ordering of the sub-forests in $\mathcal{O}$ as follows. For any sub-forests $\mathcal{A}_1, \mathcal{A}_2$
in $\mathcal{O}$, we say $\mathcal{A}_1$ is smaller than $\mathcal{A}_2$ if $A_1^{(i)}$ is either an empty tree or a subtree of $A_2^{(i)}$
for $i = 1, 2, \ldots, k$. Our algorithm enumerates $\mathcal{A} \in \mathcal{O}$ in topologically increasing order and
computes $\mathrm{masp}(\mathcal{A})$ based on Theorem 5.8.

In Lemma 5.9, we bound the number of decompositions of each sub-forest of $\mathcal{T}$. Theorem
5.10 states the complexity of the algorithm.

Lemma 5.9. Each sub-forest of $\mathcal{T}$ has $O((kD)^{kD+1})$ decompositions, and generating those
decompositions takes $O(k^2 D^2)$ time per decomposition.

Proof. Let $\mathcal{A}$ be a sub-forest of $\mathcal{T}$. Since the maximum degree of any agreement supertree of
$\mathcal{A}$ is bounded by $kD$, we consider only decompositions that consist of at most $kD$ elements.
We claim that for each $d \in \{2, \ldots, kD\}$, the sub-forest $\mathcal{A}$ owns $O((d + 2)^{kD})$ decompositions
$(\mathcal{B}_1, \ldots, \mathcal{B}_d)$. Summing up those asymptotic terms gives us the specified bound.

The key of this proof is to prove that for each $s \in \{1, \ldots, k\}$, the tree $A^{(s)}$ contributes
at most $(d + 1)^D + d \le (d + 2)^D$ sequences $B_1^{(s)}, \ldots, B_d^{(s)}$, and generating those sequences
requires $O(d)$ time per sequence. We have two cases, each corresponding to a type of the
above sequence.

Case 1: One term in the sequence is $A^{(s)}$; therefore the other terms are empty trees.
Then, we can generate this sequence by assigning $A^{(s)}$ to exactly one term and setting the
rest to be empty trees. This case provides exactly $d$ sequences and enumerates them in
$O(d)$ time per sequence.

Case 2: No term in the above sequence is $A^{(s)}$. Consider an integer $r \in \{0, 1, \ldots, d\}$
and assume that the sequence consists of exactly $r$ terms that are nonempty trees. Then
those $r$ nonempty trees are isomorphic to pairwise distinct subtrees attached to the root of
$A^{(s)}$. Let $\delta$ be the degree of the root of $A^{(s)}$. We generate the sequence as follows. First
we draw $r$ pairwise distinct subtrees attached to the root of $A^{(s)}$. Next, we select $r$ terms
in the sequence and distribute the above subtrees to them. Finally we set the remaining
terms to be empty trees. Hence this case gives at most

$$\sum_{r \le \min\{\delta, d\}} \binom{\delta}{r} \frac{d!}{(d-r)!} \;\le\; \sum_{r=0}^{D} \binom{D}{r} d^r \;=\; (d+1)^D$$

sequences, and generates them in $O(d)$ time per sequence. ■
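The two counting inequalities used in this proof can be checked numerically. The sketch below counts the Case 2 sequences as "choose $r$ subtrees, then place them injectively into $d$ positions", which is one reading of the generation procedure described above; the function name is hypothetical. The check starts at $D = 2$, since for $D = 1$ the inequality $(d+1)^D + d \le (d+2)^D$ fails.

```python
from math import comb, perm


def case2_sequences(delta, d):
    """Ways to pick r of delta subtrees and assign them to r of d positions."""
    return sum(comb(delta, r) * perm(d, r) for r in range(min(delta, d) + 1))


# Case 2 bound and the combined bound over both cases, for small parameters.
case2_ok = all(
    case2_sequences(delta, d) <= (d + 1) ** D
    for D in range(2, 7)
    for delta in range(0, D + 1)
    for d in range(2, 9)
)
combined_ok = all(
    (d + 1) ** D + d <= (d + 2) ** D
    for D in range(2, 7)
    for d in range(2, 9)
)
```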

Theorem 5.10. A maximum agreement supertree of $k$ rooted phylogenetic trees can be
obtained in $O((kD)^{kD+3}(2n)^k)$ time.

Proof. Testing if a sub-forest is terminal takes $O(k)$ time, and each terminal sub-forest $\mathcal{A}$
then requires $O(k^2)$ time for computing $\Lambda(\mathcal{A})$. By Lemma 5.9, each non-terminal sub-forest
requires $O((kD)^{kD+3})$ running time. Summing up those asymptotic terms for the $O((2n)^k)$
sub-forests of $\mathcal{T}$ gives us the specified time bound. ■



6. Algorithm for MASP of unrooted trees 

Let $\mathcal{T}$ be a set of $k$ unrooted phylogenetic trees. This section extends the algorithm in
Section 5 to find the size of a maximum agreement supertree of $\mathcal{T}$ in $O((kD)^{kD+3}(4n)^k)$
time. The maximum agreement supertree can be obtained by backtracking.

We say that a set of $k$ rooted trees $\mathcal{F} = \{F^{(1)}, \ldots, F^{(k)}\}$ is a rooted variant of $\mathcal{T}$ if we
can obtain each $F^{(i)}$ by rooting $T^{(i)}$ at some internal node. One naive approach is to use
the algorithm in the previous section to solve MASP for each rooted variant of $\mathcal{T}$. Each
rooted variant then gives us a solution, and the maximum of those solutions is the size of
a maximum agreement supertree of $\mathcal{T}$. Because there are $O(n^k)$ rooted variants of $\mathcal{T}$, this
approach adds an $O(n^k)$ factor to the complexity of the algorithm for rooted trees.

We now show how to improve the above naive algorithm. As mentioned in the previous
section, the computation of each rooted variant of $\mathcal{T}$ consists of $O((2n)^k)$ sub-problems
which correspond to its sub-forests. (Please refer to Definition 5.1 for the concept of sub-forest.)
Since different rooted variants may have some common sub-forests, the total number
of sub-problems we have to run is much smaller than $O(2^k n^{2k})$. More precisely, we will show
that the total number of sub-problems is only $O((4n)^k)$.

A (rooted or unrooted) tree is trivial if it is a leaf or an empty tree. A maximal subtree 
of an unrooted tree T is a rooted tree obtained by first rooting T at some internal node v 
and then removing at most one nontrivial subtree attached to $v$. Let $\mathcal{O}$ denote the set of
sub-forests of all rooted variants of $\mathcal{T}$.

Lemma 6.1. Let $\mathcal{A} = \{A^{(1)}, \ldots, A^{(k)}\}$ be a set of rooted trees. Then $\mathcal{A} \in \mathcal{O}$ if and only if
each $A^{(i)}$ is either a trivial subtree or a maximal subtree of $T^{(i)}$.

Proof. Let $\mathcal{F}$ be a rooted variant of $\mathcal{T}$ such that $\mathcal{A}$ is a sub-forest of $\mathcal{F}$. Fix an index
$s \in \{1, \ldots, k\}$ and let $v$ be the root node of $A^{(s)}$. Our claim is straightforward if either $A^{(s)}$
is trivial or $v$ is the root node of $F^{(s)}$. Otherwise, let $u$ be the parent of $v$ in $F^{(s)}$. Hence
$A^{(s)}$ is the maximal subtree of $T^{(s)}$ obtained by first rooting $T^{(s)}$ at $v$ and then removing
the complete subtree rooted at $u$.

Conversely, we construct a rooted variant $\mathcal{F}$ of $\mathcal{T}$ such that $\mathcal{A}$ is a sub-forest of $\mathcal{F}$ as
follows. For each $i \le k$, if $A^{(i)}$ is trivial or $A^{(i)}$ is a tree obtained by rooting $T^{(i)}$ at some
internal node then constructing $F^{(i)}$ is straightforward. Otherwise $A^{(i)}$ is a maximal subtree
of $T^{(i)}$ obtained by first rooting $T^{(i)}$ at some internal node $v$ and then removing exactly
one nontrivial subtree $\tau$ attached to $v$. Hence $F^{(i)}$ is the tree obtained by rooting $T^{(i)}$ at
$u$, where $u$ is the root of $\tau$. ■

Theorem 6.2. We can find a maximum agreement supertree of $k$ unrooted phylogenetic
trees in $O((kD)^{kD+3}(4n)^k)$ time.

Proof. The key of this proof is to show that each tree $T^{(i)}$ contributes at most $(3n - 1)$
maximal subtrees. Together with the at most $n + 1$ trivial subtrees of $T^{(i)}$, Lemma 6.1 then
gives at most $4n$ choices for each $A^{(i)}$, and it follows that $|\mathcal{O}| \le (4n)^k$. The specified running time of our algorithm
is then straightforward because each subproblem requires $O((kD)^{kD+3})$ time as given in
the proof of Theorem 5.10. Assume that the tree $T^{(i)}$ has exactly $L$ leaves, with $L \le n$.
We now count the number of maximal subtrees $T$ of $T^{(i)}$ in two cases.

Case 1: $T$ is obtained by rooting $T^{(i)}$ at some internal node. Hence this case provides
at most $L - 1 < n$ maximal subtrees.

Case 2: $T$ is obtained by first rooting $T^{(i)}$ at some internal node $v$ and then removing a
nontrivial subtree $\tau$ attached to $v$. Notice that there is a one-to-one correspondence between
the tree $T$ and the directed edge $(v, u)$ of $T^{(i)}$, where $u$ is the root node of $\tau$. There are
$2L - 2$ or fewer undirected edges in $T^{(i)}$ but exactly $L$ of them are adjacent to the leaves.
Hence this case gives us at most $2(2L - 2 - L) < 2n - 1$ maximal subtrees. ■
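The two cases of this count can be reproduced on a concrete unrooted tree. In the sketch below the tree is given as an adjacency dictionary, which is an assumed representation, and the function name is hypothetical. Case 1 contributes one maximal subtree per internal node, and Case 2 one per directed edge between two internal nodes.

```python
def count_maximal_subtrees(adj):
    """Count maximal subtrees of an unrooted tree.

    adj maps each node to the set of its neighbours; leaves are the nodes
    with exactly one neighbour.
    """
    internal = {v for v, nbrs in adj.items() if len(nbrs) > 1}
    # Case 1: root at an internal node v and remove nothing.
    rootings = len(internal)
    # Case 2: root at v and remove the nontrivial subtree hanging off neighbour u.
    removals = sum(1 for v in internal for u in adj[v] if u in internal)
    return rootings + removals


# Unrooted tree with leaves a, b, c, d and internal nodes x, y joined by an edge.
adj = {
    "x": {"a", "b", "y"},
    "y": {"c", "d", "x"},
    "a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"},
}
n = 4  # number of labels, assuming these four leaves are the whole label set
total = count_maximal_subtrees(adj)
```

For this quartet tree the count is 4, comfortably below the bound $3n - 1 = 11$ used in the proof.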